SUMMARY

This is a final report on the research work supported by the RNR NAS at NASA Ames 
Research Center under Grant NAG 2-827, Massively Parallel and Scalable Implicit 
Time Integration Algorithms for Structural Dynamics. 


1. Motivations and research plan 

Explicit codes are often used to simulate the nonlinear dynamics of large-scale struc- 
tural systems, even for low frequency response, because the storage and CPU requirements 
entailed by the repeated factorizations traditionally found in implicit codes rapidly over- 
whelm the available computing resources. With the advent of parallel processing, this 
trend is accelerating because of the following additional facts: (a) explicit schemes are 
easier to parallelize than implicit ones, and (b) explicit schemes induce short range inter- 
processor communications that are relatively inexpensive, while the factorization methods 
used in most implicit schemes induce long range interprocessor communications that often 
ruin the sought-after speed-up. However, the time step restriction imposed by the Courant 
stability condition on all explicit schemes cannot yet be offset by the speed of the currently 
available parallel hardware. Therefore, it is essential to develop efficient alternatives to 
direct methods that are also amenable to massively parallel processing because implicit 
codes using unconditionally stable time-integration algorithms are computationally more 
efficient when simulating the low-frequency dynamics of aerospace structures. 
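
The time step argument above can be illustrated on a single undamped oscillator u'' + omega^2 u = 0. The sketch below is only a toy model and is not taken from the codes discussed in this report: the function names are hypothetical, the explicit scheme is the standard central difference method (stable only for omega*dt <= 2), and the implicit scheme is the Newmark average acceleration method (beta = 1/4, gamma = 1/2), which is unconditionally stable for linear problems.

```python
def central_difference(omega, dt, n_steps, u0=1.0, v0=0.0):
    """Explicit central difference for u'' + omega^2 u = 0; returns max |u|."""
    a0 = -omega**2 * u0
    # Fictitious value u(-dt) from a Taylor expansion, needed to start the scheme.
    u_prev = u0 - dt * v0 + 0.5 * dt**2 * a0
    u = u0
    peak = abs(u0)
    for _ in range(n_steps):
        u_next = 2.0 * u - u_prev - (omega * dt) ** 2 * u
        u_prev, u = u, u_next
        peak = max(peak, abs(u))
    return peak


def newmark_average_acceleration(omega, dt, n_steps, u0=1.0, v0=0.0):
    """Implicit Newmark scheme (beta=1/4, gamma=1/2); returns max |u|."""
    beta, gamma = 0.25, 0.5
    u, v = u0, v0
    a = -omega**2 * u
    peak = abs(u0)
    for _ in range(n_steps):
        # Predictors built from the known state at the current step.
        u_pred = u + dt * v + (0.5 - beta) * dt**2 * a
        v_pred = v + (1.0 - gamma) * dt * a
        # Implicit solve: (1 + beta dt^2 omega^2) a_new = -omega^2 u_pred.
        a_new = -omega**2 * u_pred / (1.0 + beta * dt**2 * omega**2)
        u = u_pred + beta * dt**2 * a_new
        v = v_pred + gamma * dt * a_new
        a = a_new
        peak = max(peak, abs(u))
    return peak


if __name__ == "__main__":
    omega = 1.0
    for dt in (0.5, 2.5):  # below and above the explicit limit dt = 2/omega
        print(dt,
              central_difference(omega, dt, 50),
              newmark_average_acceleration(omega, dt, 50))
```

In a finite element model, omega is the highest frequency of the mesh, so the explicit limit is set by the smallest element even when only the low-frequency response is of interest. The implicit scheme removes that limit at the cost of a linear solve per step, which is the cost the domain decomposition solver is meant to reduce.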

We have proposed to develop, under the NASA Research Announcement NRA2- 
35250(JLB), a massively parallel scalable methodology for large-scale implicit transient 
computations in structural mechanics that requires significantly less storage than
factorization algorithms, that is several times faster than other popular direct and
iterative solvers, that can be easily implemented on both shared and local memory parallel
processors, and that is efficient in both computation and communication. The key
ingredients of this methodology will be: (a) a novel unconditionally stable time integra- 
tion algorithm for hybrid substructuring methods, (b) a domain decomposition method 
based on a hybrid variational principle and featuring a massively parallel and numerically 
efficient preconditioner, and (c) a mesh partitioning algorithm for implicit computations 
that optimizes a compromise between load balancing and communication costs. 

More specifically, we have proposed three tasks to be completed during a three-year
research program:

TASK 1: the design of an unconditionally stable and second order accurate parallel
implicit time-integration scheme that is based on the FETI domain decom-
position methodology developed by the PI.

TASK 2: the development of a scalable parallel preconditioner for problems with a
large number of subdomains; for these problems, the spectrum of the inter-
face operator with the “lumped” preconditioner previously developed is such
that superconvergence conditions are not met.

TASK 3: the development of a two-level mesh partitioning strategy that would allow
controlling the growth of the condition number of the interface problem by
keeping the number of subdomains relatively small (and therefore the
subdomain aspect ratio close to unity) without reducing the degree of
parallelism of the domain decomposition method. Toward the end of the
first funding year, we found that the same objectives could be better
achieved via the design of a coarsening operator for dynamics problems that
would propagate the error globally, control the condition number associated
with fine mesh partitions, and therefore accelerate convergence.

After several discussions with our first grant monitor, Dr. Eddy Pramono, the follow- 
ing two tasks were added: 

TASK 4: the tuning of the parallel domain decomposition method for problems with
multiple and/or repeated right hand sides in order to efficiently solve linear
transient problems such as those encountered in aeroelastic dynamic
computations.

TASK 5: the improvement of the performance of the FETI domain decomposition
method for heterogeneous plate and shell substructures such as those
encountered in stiffened aircraft wings.
2. Progress history

2.1. TASK 1 

TASK 1 was completed during the first year of the grant. 

2.2. TASK 2 

During the first funding year, we have developed a scalable parallel preconditioner 
based on a force/displacement interpretation of the FETI methodology, and have imple- 
mented it on the iPSC-860 parallel processor. This preconditioner is optimal in the sense 
that it ensures a performance that is independent of the mesh size. However, it is more 
expensive than the original lumped preconditioner and requires more storage. The relative 
performance of both preconditioners is problem dependent, and machine dependent in the 
sense that memory can be a limitation for the optimal preconditioner. However, both 
preconditioners outperform a direct skyline solver. 

During the second funding year, we have coupled both lumped and optimal pre- 
conditioners with the coarse grid operator developed for dynamic problems to ensure a 
performance that is independent of the number of subdomains. We have also analyzed the 
effect of the subdomain aspect ratio on the convergence rate of the preconditioned FETI 
method and developed a fast optimization algorithm for improving the aspect ratio of
existing mesh partitions. We have shown that the new optimization algorithm can improve
the solution time of the FETI method by factors greater than 1.6.

During the third funding year, we have improved both lumped and optimal
preconditioners to efficiently solve problems with heterogeneous plate and shell structures.

2.3. TASK 3 

During the second and third funding years, we have developed a new efficient and 
scalable domain decomposition method for solving implicitly linear and nonlinear time- 
dependent problems in computational mechanics. The method is derived by adding a 
coarse problem to the transient FETI substructuring algorithm developed during the first 
funding year in order to propagate the error globally and accelerate convergence. We have 
proved that in the limit for large time steps, the new method converges toward the FETI 
algorithm for time-independent problems. We have reported computational results that 
confirm that the optimal convergence properties of the time-independent FETI method 
are preserved in the time-dependent case. We have also presented an iterative scheme for
efficiently solving the coarse problem on massively parallel processors, and demonstrated
the effective scalability of the new transient FETI method with the large-scale finite
element dynamic analysis of several diffraction grating structural models on the
Paragon XP/S system. We have shown that for sufficiently large problems and/or fine mesh
partitions, the new domain decomposition method outperforms both the original one and 
the popular direct skyline solver. 
2.4. TASK 4


During the second funding year, we have also developed a methodology for extending 
the range of applications of domain decomposition methods to problems with multiple or 
repeated right hand sides. Such problems arise, for example, in multiple load static
analyses, in implicit linear dynamics computations, in the solution of nonlinear problems
via a quasi-Newton scheme, in various structural eigenvalue problems, and in the iterative
solution of the FETI coarse grid problems. Basically, we have formulated the overall problem
as a series of minimization problems over K-orthogonal and supplementary subspaces, and
tailored the preconditioned conjugate gradient algorithm to solve them efficiently. The
resulting solution method is scalable, whereas direct factorization schemes and forward and
backward substitution algorithms are not. We have illustrated the proposed methodology
with the solution of structural dynamics problems, and highlighted its potential to outperform
forward and backward substitutions on the iPSC-860 and Paragon XP/S computers. Of
particular importance is the impact of this methodology on the scalable parallel solution 
of coarse grid problems. 
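
The idea of reusing conjugate gradient information across right hand sides can be sketched as follows. This is a minimal illustration under stated assumptions, not the FETI implementation: it uses an unpreconditioned conjugate direction iteration on a small dense symmetric positive definite matrix, whereas the method described above works with a preconditioned conjugate gradient algorithm on the interface problem. The class name `ReusableCG` and its attributes are hypothetical. Search directions from earlier solves are stored; each new solve first minimizes the energy over their span, then iterates only in the complementary subspace by keeping new directions orthogonal, with respect to the system matrix, to all stored ones.

```python
import numpy as np


class ReusableCG:
    """Conjugate direction solver that keeps its search directions between solves."""

    def __init__(self, A, tol=1e-10, max_iter=500):
        self.A = A
        self.tol = tol
        self.max_iter = max_iter
        self.P = []    # stored search directions, mutually A-orthogonal
        self.AP = []   # products A @ p for each stored direction
        self.pAp = []  # scalars p^T A p for each stored direction

    def solve(self, b):
        A = self.A
        x = np.zeros(len(b))
        # Step 1: exact minimization over the span of the stored directions.
        for p, pAp in zip(self.P, self.pAp):
            x += (p @ b) / pAp * p
        r = b - A @ x
        b_norm = np.linalg.norm(b)
        iters = 0
        # Step 2: iterate in the A-orthogonal complement of the stored subspace.
        while np.linalg.norm(r) > self.tol * b_norm and iters < self.max_iter:
            p = r.copy()
            for q, Aq, qAq in zip(self.P, self.AP, self.pAp):
                p -= (p @ Aq) / qAq * q  # make p A-orthogonal to every stored q
            Ap = A @ p
            pAp = p @ Ap
            alpha = (p @ r) / pAp
            x += alpha * p
            r -= alpha * Ap
            self.P.append(p)
            self.AP.append(Ap)
            self.pAp.append(pAp)
            iters += 1
        return x, iters


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 40
    M = rng.standard_normal((n, n))
    A = M.T @ M + n * np.eye(n)      # symmetric positive definite test matrix
    solver = ReusableCG(A)
    b = rng.standard_normal(n)
    print(solver.solve(b)[1])        # iterations for the first right hand side
    print(solver.solve(b)[1])        # iterations when the same one is repeated
```

Storing every direction costs memory and one extra orthogonalization per stored vector per iteration, so the approach pays off when the right hand sides are related, as in successive time steps of a linear transient analysis.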

During the third funding year, we have extensively benchmarked the new transient 
FETI solver resulting from TASK 3 and TASK 4 and applied it to the massively parallel 
solution of realistic aeroelastic simulations. We have also transferred this technology to 
several aerospace companies and finite element software houses including Lockheed and 
Centrics. 

2.5. TASK 5 

Numerical experiments have shown us that for stiffened shell problems such as those 
encountered in aircraft wing structures, and/or problems with heterogeneous substruc- 
tures (jumps in the material properties), the FETI method does not perform as well as 
for smoother problems. Indeed, shell problems are related to the biharmonic operator, 
and therefore are more ill-conditioned than elasticity problems which are related to the 
Laplacian operator. Moreover, jumps in the material properties across the substructure 
interfaces require a different redistribution of the interface tractions and jumps than cur- 
rently done in the FETI method. Addressing both issues would improve the performance 
of the FETI method for extremely difficult problems and enhance its robustness. 
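
The conditioning gap between the two operators can be seen on a one-dimensional model, sketched below under simplifying assumptions that are not part of the FETI work itself: a uniform grid, the standard second-difference matrix as the discrete Laplacian, and its square as the discrete biharmonic operator with simply supported ends. The function names are hypothetical. Because the second matrix is the square of the first, its condition number is the square of the Laplacian's, so refining the mesh degrades it much faster.

```python
import numpy as np


def laplacian_1d(n):
    """Second-difference matrix tridiag(-1, 2, -1) on n interior grid points."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def condition_numbers(n):
    """Return the 2-norm condition numbers of the model Laplacian and biharmonic."""
    L = laplacian_1d(n)
    B = L @ L  # model biharmonic operator (simply supported ends)
    return np.linalg.cond(L), np.linalg.cond(B)


if __name__ == "__main__":
    for n in (20, 40, 80):
        cond_lap, cond_bih = condition_numbers(n)
        print(n, cond_lap, cond_bih)
```

Each halving of the mesh size multiplies the first condition number by roughly 4 and the second by roughly 16, which is consistent with the slower convergence observed for plate and shell problems.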

During the second and third years of funding, we have addressed mainly the heteroge- 
neous substructures problem; we have devised a smoothing operator for the FETI method 
that improves its convergence rate when applied to these difficult problems encountered 
in wing-box structures. This smoothing operator has been validated for model problems. 
During the third year of funding, we have further developed this smoother for realistic
structural models and initiated its integration into the full FETI code. Finally, we have
also augmented the FETI method with corner modes in order to handle more efficiently 
plate and shell problems. 
able Parallel Coarse Solvers, and Global/Local Analysis,” Domain-Based Parallelism
by D. Keyes, Y. Saad and D. G. Truhlar, SIAM, pp. 141-160 (1995)

12. C. Farhat, F. Hemez, and J. Mandel, “Improving the Convergence Rate of a Transient
Substructuring Iterative Method Using the Rigid Body Modes of its Static Equiva-
lent,” AIAA Paper 95-1271, AIAA 36th Structural Dynamics Meeting, New Orleans,
Louisiana, April 10-13 (1995)

13. C. Farhat, L. Crivelli and F. X. Roux, “A Transient FETI Methodology for Large- 
Scale Parallel Implicit Computations in Structural Mechanics,” International Journal 
for Numerical Methods in Engineering, Vol. 37, pp. 1945-1975 (1994)

14. C. Farhat, L. Crivelli and F. X. Roux, “Extending Substructure Based Iterative Solvers 
4. Technology transfer

4.1. TOP/DOMDEC 

The TOP/DOMDEC software package for mesh partitioning and parallel processing 
is currently used in many places in both industry and government laboratories including 
IBM, SGI, Lockheed, Ford Motors, Centrics, the Livermore National Laboratories, and 
MCNC. 

4.2. FETI 

The FETI solvers have been implemented in production codes at Lockheed and Ford 
Motors, and are currently being examined by commercial finite element software houses. 

4.3. Projection based preconditioners 

The projection based preconditioners and techniques for iteratively solving systems
with multiple/repeated right hand sides have found their way into the Spectrum code of
Centrics.